Abstract

The topological filtration of interacting RNA complexes is studied, and the role of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures, is analyzed. We prove that for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that for fixed genus there are only finitely many classes of interaction structures. In particular, the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail and found to be expressed by a multiple context-free grammar extending the usual one for RNA secondary structures. We show that in $O(n^6)$ time and $O(n^4)$ space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.

Keywords: RNA interaction structure, topological genus, irreducible shadow, partition function



1. Introduction 



RNA-RNA interactions constitute one of the fundamental mechanisms of cellular regulation. Instances of small RNAs binding a larger (m)RNA target include: the regulation of translation in both prokaryotes (Narberhaus and Vogel, 2007) and eukaryotes (McManus and Sharp, 2002; Banerjee and Slack, 2002), the targeting of chemical modifications (Bachellerie et al., 2002), insertion editing (Benne, 1992) and transcriptional control (Kugel and Goodrich, 2007). For a variety of RNA classes including miRNAs, siRNAs, snRNAs, gRNAs, and snoRNAs, a salient feature is the formation of RNA-RNA interaction structures that are far more complex than simple sense-antisense interactions. Accordingly, the ability to predict the details of RNA-RNA interactions in terms of the thermodynamics of binding and in its structural consequences is a necessary prerequisite to understanding RNA-based regulation mechanisms. The exact location of the binding and the subsequent impact of the interaction on the structure of the target molecule has potentially profound biological consequences. In case of sRNA-mRNA interactions, such details determine whether the sRNA is a positive or negative regulator of transcription depending on whether binding exposes or covers the Shine-Dalgarno sequence (Sharma et al., 2007; Majdalani et al., 2002). Effects along these lines have been observed also using artificially designed opener and closer RNAs that regulate the binding of the HuR protein to human mRNAs (Meisner et al.; Hackermüller et al.).



An RNA molecule is a linearly oriented sequence of four types of nucleotides, namely, A, U, C, and G. This sequence is endowed with a well-defined orientation from the 5'- to the 3'-end and referred to as the backbone. Each nucleotide can form a base pair by interacting with at most one other nucleotide by establishing hydrogen bonds. Here we restrict ourselves to Watson-Crick base pairs GC and AU as well as the wobble base pairs GU. In the following, base triples as well as other types of more complex interactions are neglected. RNA structures can be presented as diagrams by drawing the backbone horizontally and all base pairs as arcs in the upper half-plane; see Figure 1. This set of arcs provides our coarse-grained RNA structure, in particular ignoring any spatial embedding or geometry of the molecule beyond its base pairs.

Figure 1. (A) An RNA secondary structure and (B) its diagram representation.

Accordingly, particular classes of base pairs translate into specific structure categories, the most prominent of which are secondary structures (Kleitman, 1970; Nussinov et al.; Waterman and Smith, 1978; Waterman, 1979). When represented as diagrams, secondary structures have only non-crossing base pairs (arcs). Beyond RNA secondary structures are the RNA pseudoknot structures that allow for cross-serial interactions (Rivas and Eddy, 1999). There are several meaningful filtrations of cross-serial interactions (Orland and Zee, 2002; Reidys et al., 2011, 2010). Given an RNA coarse-grained structure class together with an energy function, "folding" an RNA sequence means to compute a minimum free energy (MFE) configuration^1 or a partition function for the sequence.
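The distinction between secondary structures (non-crossing arcs) and pseudoknot structures (cross-serial interactions) can be illustrated by a minimal Python sketch. The arc representation as pairs `(i, j)` of backbone positions and the function names `crosses` and `is_secondary_structure` are assumptions made here for illustration; they do not come from the paper.

```python
from itertools import combinations


def crosses(arc_a, arc_b):
    """Return True if two arcs cross when drawn in the upper half-plane."""
    # Normalize each arc to (left, right) and order the two arcs by left end.
    (i, j), (k, l) = sorted((tuple(sorted(arc_a)), tuple(sorted(arc_b))))
    return i < k < j < l


def is_secondary_structure(arcs):
    """A diagram is a secondary structure iff no two of its arcs cross."""
    return not any(crosses(a, b) for a, b in combinations(arcs, 2))


# Hypothetical examples: nested and side-by-side arcs versus a crossing pair.
hairpins = [(1, 10), (2, 9), (12, 20)]
pseudoknot = [(1, 6), (4, 9)]
```

With these inputs, `is_secondary_structure(hairpins)` holds while `is_secondary_structure(pseudoknot)` does not, since the arcs `(1, 6)` and `(4, 9)` interleave.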



RNA interaction structures are structures over two backbones. We distinguish internal arcs and external arcs as having their endpoints on the same and different backbones, respectively. Interaction structures are represented as two backbones with internal and external arcs drawn in the upper half-plane. Alternatively, they can be represented by drawing the two backbones on top of each other, see Figure 2.

^1 with respect to the a priori specified energy function

Figure 2. (A) Diagram representation of an RNA-RNA interaction structure. (B) The representation of (A) with the two backbones drawn on a horizontal line.



The simplest approach for folding RNA-RNA interaction structures concatenates two (or more) interacting sequences one after another remembering the specific merge point (cut-point) and then employs the standard secondary structure folding algorithm on a single strand with a slightly modified energy model that treats loops containing cut-points as external elements. The software tools RNAcofold (Hofacker et al., 1994; Bernhart et al., 2006), pairfold (Andronescu et al.), and NUPACK (Dirks et al., 2007) subscribe to this strategy. This approach falls short of predicting many important motifs such as kissing-hairpin loops. The paradigm of concatenation has also been generalized to include cross-serial interactions (Rivas and Eddy, 1999). The resulting model, however, still does not generate all relevant interaction structures (Chitsaz et al., 2009b; Qin and Reidys, 2007). An alternative line of thought, implemented in RNAduplex and RNAhybrid (Rehmsmeier et al., 2004), is to neglect all internal base pairings in either strand, i.e., to compute the minimum free energy (MFE) secondary structure of hybridization of otherwise unstructured RNAs. RNAup (Mückstein et al., 2006, 2008) and intaRNA (Busch et al., 2008) restrict interactions to a single interval that remains unpaired in the secondary structure for each partner. As a special case, snoRNA/target complexes are treated more efficiently using a specialized tool (Tafer et al., 2009) due to the highly conserved interaction motif. Algorithmically, the approaches mentioned so far are close relatives of the "classical" RNA folding recursions given by Zuker and Sankoff (1984); Waterman and Smith (1978). A different approach was taken independently by Pervouchine (2004) and Alkan et al. (2006), who proposed MFE folding algorithms for predicting the AP-structure of two interacting RNA molecules. In this model, the intramolecular structures of each partner are pseudoknot-free, the intermolecular binding pairs are non-crossing, and there is no so-called "zig-zag" motif, see Sec. 2. The optimal joint structure can be computed in $O(N^6)$ time and $O(N^4)$ space by means of dynamic programming.



In contrast to the RNA secondary folding problem, where minimum energy folding and partition functions can be obtained by similar algorithms, the case of interaction structures is more involved. The reason is that simple unambiguous grammars are known for RNA secondary structures (Dowell and Eddy, 2004) while the disambiguation of the grammar underlying the Alkan-Pervouchine algorithm requires the introduction of a large number of additional non-terminals (which algorithmically translate into additional dynamic programming tables). The partition function was derived independently by Chitsaz et al. (2009b) (piRNA) and Huang et al. (2009) (rip1). In Huang et al. (2010), probabilities of interaction regions as well as entire hybrid blocks were derived. Although the partition function of joint structures can also be computed in $O(N^6)$ time and $O(N^4)$ space, the current implementations require large computational resources. Salari et al. (2009) recently achieved a substantial speed-up making use of the observation that the external interactions mostly occur between pairs of unpaired regions of single structures. Chitsaz et al., on the other hand, use tree-structured Markov Random Fields to approximate the joint probability distribution of multiple ($\geq 3$) contact regions. The RNA-RNA interaction structures of Alkan et al.; Bernhart et al. (2006); Hofacker et al. (1994); Huang et al. have the following features:



• when drawing the two backbones on top of each other, all base pairs are 
non-crossing, i.e., no pseudoknots formed by internal or external arcs are 
allowed, 

• zig-zag motifs are disallowed. 

This paper will relax the above constraints and propose a novel classification of RNA-RNA interaction structures based on the topological filtration of RNA interaction structures. Interaction structures that do not belong to the Alkan-Pervouchine class exist: for instance the integral RNA (hTER) of the human telomerase ribonucleoprotein has a conserved secondary structure that contains a potential pseudoknot (Ly et al., 2003). There is evidence that the two conserved complementary sequences of one stem of the hTER pseudoknot domain can pair intermolecularly in vitro, and that formation of this stem as part of a novel "trans-pseudoknot" is required for the telomerase to be active in its dimeric form.

JØRGEN E. ANDERSEN, FENIX W.D. HUANG, ROBERT C. PENNER, AND CHRISTIAN M. REIDYS

The classification and expansion of pseudoknotted RNA structures over one backbone via topological genus of the associated fatgraph were first proposed by Orland and Zee (2002); Penner (2004); Bon et al. (2008). In Reidys et al. (2011); Zagier (1995), it was proved that for any genus, there are only finitely many shadows, i.e., particular, simple atomic motifs. In case of genus one, these shadows were first presented in Bon et al. (2008). Shadows give rise to a novel structure class, naturally generalizing RNA secondary structures. These $\gamma$-structures (Reidys et al., 2011) are generated by concatenation and nesting of irreducible building blocks of genus $\leq \gamma$. We shall present the topological classification of RNA-RNA interaction structures. This filtration gives rise to the notion of $\gamma$-structures over two backbones. In analogy to their one-backbone counterparts, $\gamma$-structures over two backbones are composed of irreducible building blocks of genus $\leq \gamma$ and have accordingly arbitrarily high genus. We shall see that for any fixed genus, there are only finitely many irreducible shadows over two backbones. In particular, we study genus zero structures over two backbones. The latter are the two-backbone analogue of RNA secondary structures (which are well-known to be genus zero structures over one backbone). 0-structures over two backbones already exhibit interesting features not shared with AP-structures, see Figure 3. We furthermore derive an unambiguous grammar for 0-structures over two backbones, which translates into an efficient dynamic programming algorithm. This grammar, illustrated in Figure 4, allows the calculation of the minimum free energy, partition function and Boltzmann-sampling. It explicitly treats hybrids and gap structures, i.e., maximal regions with exclusively intermolecular interactions and maximal regions with base pairs over one backbone. The grammar thus facilitates the computation of the probability of hybrids, the target interaction probability between two RNA strands, and the probability of gap structures.

Figure 3. (A) Homo sapiens ACA27 snoRNA. This H/ACA box RNA was cloned (Kiss et al., 2004; Ofengand and Bakin, 1997) from a HeLa cell extract immunoprecipitated with an anti-GAR1 antibody. (B) The structure contains two crossing hybrids, which cannot be found in AP-structures.




Figure 4. An unambiguous grammar of RNA-RNA interaction structures of genus zero over two backbones. Basic building blocks are: tight structures (gray), secondary structures and hybrid structures (A). Only tight structures exhibit cross-serial interactions (B) and are further decomposed (C).
2. Basic facts

2.1. Diagrams. A diagram is a labeled graph over the vertex set $[n] = \{1, \ldots, n\}$ in which each vertex has degree $\leq 3$, represented by drawing its vertices in a horizontal line and its arcs $(i, j)$, where $i < j$, in the upper half-plane. A backbone is a sequence of consecutive integers contained in $[n]$. A diagram over $b$ backbones is a diagram together with a partition of $[n]$ into $b$ backbones, see Figure 1 (B). In the following we shall denote the set of diagrams over one and two backbones by $\mathbb{D}$ and $\mathbb{E}$, respectively.

The vertices and arcs of a diagram correspond to nucleotides and base pairs, respectively. For a diagram over $b$ backbones, the leftmost vertex of each backbone denotes the 5' end of the RNA sequence, while the rightmost vertex denotes the 3' end. In case of $b > 1$, we shall distinguish two types of arcs: an arc is called exterior if it connects different backbones and interior otherwise. Diagrams over $b$ backbones without exterior arcs are disjoint unions of diagrams over one backbone.



The particular case $b = 2$ is referred to as RNA interaction structures (Huang et al., 2009, 2010), see Figure 2 (A). As mentioned before, interaction structures are oftentimes represented alternatively by drawing the two backbones $R$ and $S$ on top of each other, indexing the vertices such that $R_1$ is the 5' end of $R$ and $S_1$ is the 3' end of $S$.



A zig-zag is defined as follows: given two sequences $R$ and $S$, suppose that $R_aS_b$ (i.e., $R_a$ is base paired with $S_b$), $R_iR_j$, and $S_{i'}S_{j'}$ are arcs with $i < a < j$ and $i' < b < j'$. We say that $R_iR_j$ is subsumed in $S_{i'}S_{j'}$, if for any $R_kS_{k'} \in I$, $i < k < j$ implies $i' < k' < j'$. Finally, a zig-zag is a subgraph containing two dependent interior arcs $R_{i_1}R_{j_1}$ and $S_{i_2}S_{j_2}$, neither one subsuming the other, see Figure 5, where dependence here means that there exists at least one exterior arc $R_hS_\ell$ such that $i_1 < h < j_1$ and $i_2 < \ell < j_2$.




Figure 5. A zig-zag structure. $R_1R_4$ and $S_2S_5$ are dependent interior arcs owing to the base pair $R_3S_3$, but in view of $R_2S_1$ and $R_6S_4$, neither subsumes the other.
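The zig-zag condition can be checked mechanically. The following is a minimal Python sketch under stated assumptions: interior arcs are given as index pairs `(i, j)`, exterior arcs as pairs `(r, s)` meaning that the r-th vertex of R is paired with the s-th vertex of S, and the function names `subsumed`, `dependent` and `is_zigzag` are hypothetical. The example indices are those read off the Figure 5 caption.

```python
def subsumed(inner, outer, exterior, inner_on_R):
    """Is the interior arc `inner` subsumed in the interior arc `outer`?

    `inner` and `outer` lie on different backbones; `inner_on_R` tells
    whether `inner` lies on R (and hence `outer` on S).
    """
    (i, j), (ip, jp) = inner, outer
    for r, s in exterior:
        k, kp = (r, s) if inner_on_R else (s, r)
        # An exterior arc starting below `inner` must end below `outer`.
        if i < k < j and not (ip < kp < jp):
            return False
    return True


def dependent(r_arc, s_arc, exterior):
    """Two interior arcs are dependent if some exterior arc starts below both."""
    (i1, j1), (i2, j2) = r_arc, s_arc
    return any(i1 < h < j1 and i2 < l < j2 for h, l in exterior)


def is_zigzag(r_arc, s_arc, exterior):
    """Dependent interior arcs, neither of which subsumes the other."""
    return (dependent(r_arc, s_arc, exterior)
            and not subsumed(r_arc, s_arc, exterior, True)
            and not subsumed(s_arc, r_arc, exterior, False))


# Exterior arcs R3S3, R2S1 and R6S4; interior arcs R1R4 and S2S5.
figure5_exterior = [(3, 3), (2, 1), (6, 4)]
```

Here `is_zigzag((1, 4), (2, 5), figure5_exterior)` holds, whereas dropping the arcs R2S1 and R6S4 leaves a dependent pair in which each arc is subsumed in the other, so no zig-zag remains.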



2.2. From diagrams to topological surfaces. One approach for deriving meaningful filtrations of RNA structure is to pass from diagrams to topological surfaces (Massey, 1967). It is natural to make this transition from combinatorics to topology via fatgraphs (Penner et al., 2010; Penner, 2011). A fatgraph $G$, sometimes also called "ribbon graph" or "map", is a graph $G$ together with a collection of cyclic orderings, called a fattening, one such ordering on the half-edges incident on each vertex. Each fatgraph $G$ determines an oriented surface $F(G)$ as follows: let $V(G)$ be the set of $G$-vertices and $E(G)$ be the set of $G$-edges. For each $v \in V(G)$, consider an oriented surface $P_v$ isomorphic to a polygon with $2k$ sides containing $v$ in its interior, where $k$ is the valence of $v$. The incident edges of $v$ are also incident to a univalent vertex contained in alternating sides of $P_v$, which are identified with the incident half-edges in the natural way so that the induced counter-clockwise cyclic ordering on the boundary of $P_v$ agrees with the fattening of $G$ about $v$. The surface $F(G)$ is the quotient of the disjoint union $\sqcup_{v \in V(G)} P_v$, where the frontier edges, which are oriented with the polygons on their left, are identified by an orientation-reversing homeomorphism if the corresponding half-edges lie in a common edge of $G$. This defines the oriented surface $F(G)$, which is connected if and only if $G$ is and is uniquely determined in this case by its genus $g = g(G) \geq 0$ and number $r = r(G) \geq 1$ of boundary components. Since $F(G)$ contains $G$ as a deformation retract, they share the Euler characteristic $v - e$, and the genus of $F(G)$ is given by $2 - 2g - r = v - e$.

For an RNA diagram, we may draw a representation as usual so that the backbone is a horizontal line oriented from left to right, and the arcs lie in the upper half-plane. This determines a unique fattening on any diagram, cf. the leftmost two panels in Figure 6 for the fatgraph and its corresponding surface. Each boundary component of $F(G)$ determines a closed edge-path or cycle on $G$, oriented with the surface lying on its left. In particular, a neighborhood of each edge inherits an orientation from that of $F(G)$, and these combine to give the oriented cycles as depicted in the third panel of Figure 6. Without affecting the topological type of the constructed surface, one may collapse each backbone to a single vertex with the induced fattening, called the polygonal model of the RNA, as illustrated in the rightmost panels in Figure 6. It is the orientation of each backbone from the 5' end to the 3' end that allows us to transform the fatgraph of an RNA-structure or RNA-interaction into a fatgraph with one or two vertices.




Figure 6. (A) The fatgraph of a diagram and its reduction to a single vertex. Contracting the backbone of a diagram into a single vertex decreases the length of the boundary components and preserves the genus. (B) Inflation of edges and vertices to ribbons and discs, as well as walking along the boundary components. Here we have six vertices, seven edges and one boundary component. The corresponding surface has Euler characteristic $\chi = v - e = -1$ and $g = 1$. At the last step, we collapse each backbone into a single disc, again preserving genus. The backbone of the polymer can be recovered by inflating each disc to a backbone segment.



This backbone-collapse preserves orientation, Euler characteristic and genus by construction. It is reversible by inflating each vertex to form a backbone. Using the collapsed fatgraph representation, we see that for a connected diagram over $b$ backbones, the genus $g$ of the surface (with boundary) is determined by the number $n$ of arcs as well as the number $r$ of boundary components, namely, $2 - 2g - r = v - e = b - n$, cf. Figure 6.
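The relation $2 - 2g - r = b - n$ makes the genus computable once the boundary components are counted. The sketch below, in Python, is one way to do this under stated assumptions: each backbone is collapsed to a single vertex, the boundary is traced by alternately jumping across an arc and stepping to the next paired vertex on the same backbone (cyclically), and an empty backbone is counted as one boundary component. The data layout and the names `boundary_components` and `genus` are hypothetical choices, not taken from the paper.

```python
def boundary_components(backbones, arcs):
    """Count the boundary components r of the fatgraph of a diagram.

    backbones: list of lists of vertex labels, each in 5'->3' order.
    arcs: list of (i, j) pairs; every vertex lies on at most one arc.
    """
    partner = {}
    for i, j in arcs:
        partner[i], partner[j] = j, i
    successor = {}  # next paired vertex on the same collapsed backbone
    r = 0
    for backbone in backbones:
        paired = [v for v in backbone if v in partner]
        if not paired:
            r += 1  # an empty backbone contributes one boundary component
        for v, w in zip(paired, paired[1:] + paired[:1]):
            successor[v] = w
    seen = set()
    for start in partner:
        if start in seen:
            continue
        r += 1  # a new cycle of the map v -> successor[partner[v]]
        v = start
        while v not in seen:
            seen.add(v)
            v = successor[partner[v]]
    return r


def genus(backbones, arcs):
    """Genus from 2 - 2g - r = b - n; assumes a connected diagram."""
    twice_g = (2 + len(arcs) - boundary_components(backbones, arcs)
               - len(backbones))
    if twice_g < 0 or twice_g % 2:
        raise ValueError("the formula requires a connected diagram")
    return twice_g // 2
```

For instance, two crossing arcs over one backbone, `genus([[1, 2, 3, 4]], [(1, 3), (2, 4)])`, give genus one with a single boundary component, while the same two arcs over two backbones, `genus([[1, 2], [3, 4]], [(1, 3), (2, 4)])`, give genus zero.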
Diagrams over one and two backbones are related by gluing, i.e., we have the mapping

$\alpha\colon \mathbb{E} \to \mathbb{D}$,

where $\alpha(E)$ is obtained by keeping all arcs in $E$ and connecting the 3' end of $R$ and the 5' end of $S$, see Figure 7 (A).

Figure 7. (A) Mapping a diagram over two backbones into a diagram over one backbone by gluing. (B) Mapping from two diagrams over two backbones to a diagram over two backbones by concatenating $R_2$ after $R_1$ and $S_1$ after $S_2$, preserving the orientation.

In addition to gluing, there is another operation mapping a pair of diagrams over two backbones into a diagram over two backbones: given two diagrams over two backbones, $E_1, E_2 \in \mathbb{E}$, we can insert $E_2$ into the gap of $E_1$ by concatenating the backbones $R_2$ and $R_1$ and $S_1$ and $S_2$, preserving orientation; see Figure 7 (B). This composition is by construction again a diagram over two backbones denoted $E_1 \bullet E_2$, i.e., we have a mapping

(2.1) $\mu\colon \mathbb{E} \times \mathbb{E} \to \mathbb{E}, \quad \mu(E_1, E_2) = E_1 \bullet E_2.$

It is straightforward to see that $\bullet$ is an associative product with unit given by the diagram over two empty backbones. The product $\bullet$ is not commutative.



3. Shadows 

Definition 1. A stack in a diagram is a maximal collection of parallel arcs of the form $(i, j), (i + 1, j - 1), \ldots, (i + (\ell - 1), j - (\ell - 1))$. An arc is non-crossing if there is no other arc in the diagram that crosses it, and a vertex is isolated if it has no arcs incident upon it. A shadow is a diagram with no non-crossing arcs or isolated vertices so that each stack has size one, and a shadow is non-trivial provided each backbone contains at least one paired vertex.

A diagram determines a shadow by removing all non-crossing arcs, deleting all isolated vertices and collapsing each induced stack to a single arc as in Figure 8. We shall denote the shadow of a diagram $X$ by $\sigma(X)$, so $\sigma(\sigma(X)) = \sigma(X)$. Projecting into the shadow does not affect genus, i.e., $g(X) = g(\sigma(X))$. In case there are no crossing arcs, $\sigma(X)$ becomes an empty diagram on the same number of backbones as $X$, as in Figure 8 (C). By definition, any empty backbone contributes one boundary component. For example, for a diagram $X$ over $b$ backbones that contains no crossing arcs, $\sigma(X)$ is a sequence of $b$ empty backbones with $b$ boundary components.
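The projection of a diagram onto its shadow can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: vertex labels are integers increasing from left to right across the consecutively drawn backbones, arcs are pairs `(i, j)`, and the names `crosses` and `shadow` are hypothetical. Two arcs are treated as stacked when no paired vertex separates their left ends or their right ends and the respective ends share a backbone.

```python
def crosses(a, b):
    """True if arcs a and b interleave (arcs are (left, right) pairs)."""
    (i, j), (k, l) = sorted((a, b))
    return i < k < j < l


def shadow(backbones, arcs):
    """Return (backbones, arcs) of the shadow, relabeled 1, 2, ... ."""
    arcs = sorted(tuple(sorted(a)) for a in arcs)
    # Step 1: remove all non-crossing arcs.
    arcs = [a for a in arcs if any(crosses(a, b) for b in arcs)]
    where = {v: idx for idx, bb in enumerate(backbones) for v in bb}
    # Steps 2 and 3: ignore isolated vertices and collapse stacks.
    while True:
        paired = sorted(v for a in arcs for v in a)
        rank = {v: p for p, v in enumerate(paired)}

        def stacked(outer, inner):
            (i, j), (k, l) = outer, inner
            return (rank[k] == rank[i] + 1 and rank[l] == rank[j] - 1
                    and where[i] == where[k] and where[j] == where[l])

        redundant = next((inner for outer in arcs for inner in arcs
                          if stacked(outer, inner)), None)
        if redundant is None:
            break
        arcs.remove(redundant)
    new_arcs = [(rank[i] + 1, rank[j] + 1) for i, j in arcs]
    new_backbones = [[rank[v] + 1 for v in bb if v in rank]
                     for bb in backbones]
    return new_backbones, new_arcs
```

As a usage example, the one-backbone diagram with arcs `(1, 10), (2, 7), (3, 6), (5, 9), (11, 12)` projects to the shadow with arcs `(1, 3), (2, 4)`: the two non-crossing arcs are dropped and the stack `(2, 7), (3, 6)` collapses to one arc. A diagram without crossing arcs projects to empty backbones.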
Figure 8. Shadows: (A) A diagram over one backbone and its shadow, (B) a diagram over two backbones whose shadow is again over two backbones and (C) a shadow with an empty backbone.

Let us begin by refining an observation about shadows over one backbone from Reidys et al.



Theorem 1. Shadows of genus $g \geq 1$ over one backbone have the following properties:

(a) a shadow of genus $g$ contains at least $2g$ and at most $(6g - 2)$ arcs; in particular for fixed genus, there are only finitely many shadows;

(b) for any $2g \leq \ell \leq 6g - 2$, there exists a shadow of genus $g$ containing exactly $\ell$ arcs.



TOPOLOGY OF RNA-RNA INTERACTION STRUCTURES 






Proof. First note that if there is more than one boundary component, then there must be an arc with different boundary components on its two sides and removing this arc decreases $r$ by exactly one while preserving $g$ since the number of arcs is given by $n = 2g + r - 1$. Furthermore, if there are $\nu_i$ boundary components of length $i$ in the polygonal model, then $2n = \sum_i i\,\nu_i$ since each side of each arc is traversed once by the boundary. For a shadow, $\nu_1 = 0$ by definition, and $\nu_2 \leq 1$ as one sees directly. It therefore follows that $2n = \sum_i i\,\nu_i \geq 3(r - 1) + 2$, so $2n = 4g + 2r - 2 \geq 3r - 1$, i.e., $4g - 1 \geq r$. Thus, we have $n = 2g + r - 1 \leq 2g + (4g - 1) - 1 = 6g - 2$, i.e., any shadow can contain at most $6g - 2$ arcs. The lower bound $2g$ follows directly from $n = 2g + r - 1$ since $r \geq 1$.

Let $S_{2g}$ be a shadow containing $2g$ mutually crossing arcs, i.e., each arc crosses any of the remaining $(2g - 1)$ arcs. $S_{2g}$ has genus $g$ and contains a unique boundary component of length $4g$, i.e., traversing $4g$ non-backbone arcs counted with multiplicity. We construct a new shadow $S_{2g+1}$ of genus $g$ containing $2g + 1$ arcs, by inserting a crossing arc into $S_{2g}$ from the 5' end of $S_{2g}$ such that the boundary component in $S_{2g}$ splits into one boundary component of length 3 and another of length $4g + 2 - 3 = 4g - 1$. The latter becomes the first boundary component of $S_{2g+1}$. The newly inserted arc is by construction crossing, splits a boundary component and preserves genus. We now prove the assertion by induction on the number of inserted arcs. By the induction hypothesis, there exists a shadow $S_{2g+i}$ of genus $g$ having $2g + i$ arcs, whose first boundary component has length $4g - i$. Again, we insert a crossing arc as described above, thereby splitting the first boundary component into one of length 3 and the other of length $(4g - (i + 1))$. After $i = 4g - 2$ such insertions, we arrive at a shadow whose first boundary component has length 2 while all other boundary components have length 3. Accordingly, there exists a set $\{S_{2g}, S_{2g+1}, \ldots, S_{2g+(4g-2)}\}$ of shadows all having genus $g$, where each $S_j$ contains $j$ arcs, see Figure 9. □

Figure 9. Constructing the sequence of shadows $S_\ell$ for genus $g = 2$, see Theorem 1, for $2g = 4 \leq \ell \leq 6g - 2 = 10$. Newly inserted arcs are drawn bold.

Corollary 1. A shadow over two backbones has the following properties:

(a) a shadow of genus $g \geq 1$ over two backbones contains at least $(2g + 1)$ and at most $6(g + 1) - 2$ arcs; a shadow of genus 0 has at least 2 and at most 4 arcs, in particular, the set of such shadows is finite;

(b) for any $(2g + 1) \leq \ell \leq 6(g + 1) - 2$ in case of $g \geq 1$ and $2 \leq \ell \leq 4$ in case of $g = 0$, there exists some shadow over two backbones with genus $g$ containing exactly $\ell$ arcs.

Proof. We first claim that any shadow of genus $g$ over two backbones can be obtained by cutting the backbone of a shadow over one backbone having either genus $g$ or $g + 1$. To see this, suppose we are given a shadow of genus $g$, having $r$ boundary components and $n$ arcs so that $2 - 2g - r = b - n$, i.e., $g = (2 + n - r - b)/2$, where $b = 1$. Cutting the backbone then either splits a boundary component, or merges two distinct boundary components. Since cutting does not affect arcs and increases the number of backbones by one, we have the resulting genus

$g' = (2 + n - (r + 1) - (b + 1))/2 = g - 1$ or $g' = (2 + n - (r - 1) - (b + 1))/2 = g$

as was claimed. We next observe that a shadow of genus $g = 0$ over two backbones has at least 2 arcs, while the maximum number of arcs contained in such a shadow is given by $6(0 + 1) - 2 = 4$. For $g \geq 1$, it is impossible to cut a shadow of genus $g$ having $2g$ arcs and keep the genus. Thus the shadow of genus $g$ over two backbones has at least $2g + 1$ arcs. We can always map an arbitrary shadow over two backbones of genus $g$ via $\alpha$ into a shadow over one backbone, whence the assertion. Theorem 1 guarantees that there are only finitely many such shadows, and the corollary follows. □

Corollary 2. There exist exactly seven non-trivial shadows over two backbones having genus 0.

Proof. There exists no non-trivial shadow over one backbone of genus 0 since 0-structures over one backbone are secondary structures containing exclusively non-crossing arcs. In view of Corollary 1, all non-trivial shadows over two backbones having genus 0 are therefore obtained by cutting the backbone of shadows of genus 1 over one backbone. By inspection, there are seven possible such cuts as in Figure 10. □

Figure 10. The shadows (A)-(G) over two backbones having genus 0 obtained by cutting the four shadows of genus 1 over one backbone.
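The count of seven can be reproduced by brute force. The Python sketch below is an illustration under stated assumptions: the four genus-one shadows over one backbone are encoded as words in which equal letters mark the two ends of an arc (an encoding chosen here, with the labels H, K, L, M used only as dictionary keys), every interior cut position is tried, and the genus is obtained from $2 - 2g - r = b - n$ by tracing boundary components on the collapsed backbones. All function names are hypothetical.

```python
# Word encoding (an assumption of this sketch): equal letters are arc ends.
GENUS_ONE_SHADOWS = {"H": "ABAB", "K": "ABACBC", "L": "ABCABC", "M": "ABCADBCD"}


def arcs_of(word):
    """Translate a word into arcs over the positions 1..len(word)."""
    first, arcs = {}, []
    for pos, letter in enumerate(word, start=1):
        if letter in first:
            arcs.append((first[letter], pos))
        else:
            first[letter] = pos
    return arcs


def genus(backbones, arcs):
    """Genus of a connected diagram whose vertices are all paired."""
    partner = {}
    for i, j in arcs:
        partner[i], partner[j] = j, i
    successor = {}
    for bb in backbones:
        for v, w in zip(bb, bb[1:] + bb[:1]):
            successor[v] = w  # cyclic order on the collapsed backbone
    seen, r = set(), 0
    for start in partner:
        if start in seen:
            continue
        r += 1
        v = start
        while v not in seen:
            seen.add(v)
            v = successor[partner[v]]
    return (2 + len(arcs) - r - len(backbones)) // 2


def genus_zero_cuts(word):
    """Cut positions c (first backbone = vertices 1..c) yielding genus 0."""
    size, arcs = len(word), arcs_of(word)
    return [c for c in range(1, size)
            if genus([list(range(1, c + 1)), list(range(c + 1, size + 1))],
                     arcs) == 0]


cuts = {name: genus_zero_cuts(word) for name, word in GENUS_ONE_SHADOWS.items()}
total = sum(len(c) for c in cuts.values())
```

Under this encoding the two-arc shadow admits three genus-zero cuts, the two three-arc shadows admit one and two, and the four-arc shadow admits one, so `total` equals seven, in agreement with Corollary 2.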

4. Irreducibility 

Definition 2. A diagram $E$ over $b$ backbones is called irreducible if and only if it is connected and for any two arcs $\alpha_1, \alpha_k$ contained in $E$, there exists a sequence of arcs $(\alpha_1, \alpha_2, \ldots, \alpha_{k-1}, \alpha_k)$ such that $(\alpha_i, \alpha_{i+1})$ are crossing.
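The crossing condition of Definition 2 amounts to connectivity of the graph whose nodes are arcs and whose edges are crossings. A minimal Python sketch of this check follows; it assumes integer vertex labels increasing from left to right, uses the hypothetical names `crosses` and `is_irreducible`, and does not verify the separate requirement that the diagram itself be connected.

```python
def crosses(a, b):
    """True if arcs a and b interleave (arcs are (left, right) pairs)."""
    (i, j), (k, l) = sorted((a, b))
    return i < k < j < l


def is_irreducible(arcs):
    """Check that any two arcs are linked by a chain of crossing arcs."""
    arcs = [tuple(sorted(a)) for a in arcs]
    if not arcs:
        return True  # vacuously true for a diagram without arcs
    reached, frontier = {arcs[0]}, [arcs[0]]
    while frontier:
        a = frontier.pop()
        for b in arcs:
            if b not in reached and crosses(a, b):
                reached.add(b)
                frontier.append(b)
    return len(reached) == len(arcs)
```

For example, the arcs `(1, 3), (2, 5), (4, 6)` form an irreducible diagram, since the middle arc crosses both others, whereas two crossing pairs placed side by side, `(1, 3), (2, 4), (5, 7), (6, 8)`, do not.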

We proceed by refining Theorem 1.

Corollary 3. An irreducible shadow having genus $g = 0$ over two backbones contains at least 2 and at most 4 arcs, and for any $2 \leq \ell \leq 4$, there exists an irreducible shadow of genus $g = 0$ over two backbones having exactly $\ell$ arcs. An irreducible shadow having genus $g \geq 1$ has the following properties:

(a) every irreducible shadow with genus $g$ over two backbones contains at least $2g + 1$ and at most $6(g + 1) - 2$ arcs;

(b) for arbitrary genus $g$ and any $2g + 1 \leq \ell \leq 6g - 2$, there exists an irreducible shadow of genus $g$ over one backbone having exactly $\ell$ arcs.

Proof. Part (a) follows directly from Theorem 1 and for (b), the shadows $S_{2g+1}, \ldots, S_{6g-2}$ generated in the proof of Theorem 1 are in fact irreducible as in Figure 9. □

Definition 3. Let $X$ be a diagram. We call $S'$ an irreducible shadow of $X$ (irreducible $X$-shadow) if and only if $S'$ is an irreducible shadow and any arc in $S'$ is contained in $X$. Let $I(X) = \{S' \subset X \mid S' \text{ is an irreducible } X\text{-shadow}\}$.



Clearly, our notion of irreducibility recovers for diagrams over one backbone that of Reidys et al.; Bon et al. (2008). A diagram $D$ over one backbone can iteratively be decomposed by first removing all non-crossing arcs as well as isolated vertices and second by removing irreducible $D$-shadows iteratively as follows:

• one removes (i.e., cuts the backbone at two points and after removal merges the cut-points) irreducible $D$-shadows from bottom to top, i.e., such that there exists no irreducible $D$-shadow that is nested within the one previously removed.

• if the removal of an irreducible $D$-shadow induces the formation of a non-trivial stack as in Figure 11, then it is collapsed into a single arc.

Figure 11. Removing irreducible shadows from "bottom to top" by cutting and gluing. Any stacks that are induced by these removals are collapsed into single arcs.



We next extend the decomposition of diagrams over one backbone (Reidys et al., 2011) to diagrams over two backbones. Let $E$ be a diagram over two backbones. By definition, irreducible $E$-shadows over two backbones are either connected or a disjoint union of two irreducible shadows over one backbone. Thus, $E$ can be decomposed by removing first all non-crossing arcs as well as any isolated vertices and second all irreducible $E$-shadows in two rounds as follows:

• remove any irreducible $E$-shadows over one backbone, from bottom to top, as previously described, see Figure 12.

• remove the irreducible $E$-shadows over two backbones iteratively, starting with the irreducible $E$-shadow containing the leftmost vertex of the second backbone, see Figure 12.





Figure 12. Decomposition of a shadow over two backbones. First, from 
bottom to top, the only irreducible shadow over one backbone is removed. 
During its removal, a stack of length two is induced (bold arcs), which is 
projected into a single arc. Second, the two irreducible shadows over two 
backbones are iteratively removed. 
5. $\gamma$-structures over two backbones

Definition 4. A diagram $X$ over $b$ backbones is a $\gamma$-structure over $b$ backbones if and only if we have $g(S') \leq \gamma$ for any irreducible $X$-shadow $S'$.

With foresight, we refine the notion of irreducible X-shadow as follows: 

= { 5" I 5' is an irreducible i?-shadow over one backbone }, 
f'2{E) = {S' \ S' is an irreducible iJ-shadow over two backbones, where g{a{S')) = g{S') + i 

Lemma 1. Suppose E is a 'j -structure over two backbones. Then 



(5.1) 



{g{S') + l), tf I°(i?)7^0; 





Proof. By construction, $\alpha(E)$ is a shadow over one backbone consisting of irreducible 
components of genus at most $\gamma + 1$. Thus, $\alpha(E)$ is a $(\gamma + 1)$-structure and 

$$g(\alpha(E)) = \sum_{S' \in I_1(E)} g(S') + \sum_{S' \in I_2^0(E)} g(S') + \sum_{S' \in I_2^1(E)} \bigl(g(S') + 1\bigr). \tag{5.2}$$

Let $\mathbb{S}_1 = \mathbb{S}_1(E)$ be the set of $E$-subshadows over two backbones where the backbones 
are on the same boundary component and let $\mathbb{S}_2 = \mathbb{S}_2(E)$ be those that are not. We 
have 

$$g(S') = \begin{cases} g(\alpha(S')), & \text{iff } S' \in \mathbb{S}_1(E); \\ g(\alpha(S')) - 1, & \text{iff } S' \in \mathbb{S}_2(E). \end{cases} \tag{5.3}$$

Claim 1. Suppose $I_2^0(E) = \varnothing$, then 

$$g(E) = \sum_{S' \in I_1(E)} g(S') + \sum_{S' \in I_2^1(E)} \bigl(g(S') + 1\bigr) - 1. \tag{5.4}$$

To prove this, we use that $S_1 \bullet S_2 \in \mathbb{S}_2$ for $S_1, S_2 \in \mathbb{S}_2$. By associativity of $\bullet$, we conclude 
that $E \in \mathbb{S}_2$, i.e., the two backbones of $E$ are not on the same boundary component, so that by eq. (5.3) 

$$g(E) = g(\alpha(E)) - 1, \tag{5.5}$$

and in view of eq. (5.2), Claim 1 follows. 

Claim 2. If $I_2^0(E) \neq \varnothing$, then 

$$g(E) = \sum_{S' \in I_1(E) \cup I_2^0(E)} g(S') + \sum_{S' \in I_2^1(E)} \bigl(g(S') + 1\bigr). \tag{5.6}$$

We claim that $I_2^0(E) \neq \varnothing$ implies $g(E) = g(\alpha(E))$. Indeed, $I_2^0(E) \neq \varnothing$ guarantees 
that there exists some irreducible shadow $S'_0 \in I_2^0(E)$. By definition, $S'_0$ has the property 
$g(\alpha(S'_0)) = g(S'_0)$, i.e., gluing the two $S'_0$-backbones does not merge boundary 
components, whence $S'_0 \in \mathbb{S}_1$. Now, at some point $S'_0$ appears as a factor in the 
shadow of $E$, which implies $E \in \mathbb{S}_1$. Accordingly, we have $g(E) = g(\alpha(E))$, from 
which, by eq. (5.2), it follows that 

$$g(E) = \sum_{S' \in I_1(E) \cup I_2^0(E)} g(S') + \sum_{S' \in I_2^1(E)} \bigl(g(S') + 1\bigr). \tag{5.7}$$

□ 
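The genus relations used in the proof can be checked numerically on small shadows. The following is a minimal sketch, assuming the usual fatgraph convention in which every backbone is collapsed to a single vertex and the boundary components are the cycles of "partner, then successor on the backbone"; this convention is not restated in this section and is an assumption here. The names `genus` and `lemma1_genus` are hypothetical helpers introduced only for illustration.

```python
from typing import Sequence, Tuple


def genus(backbone_sizes: Sequence[int], arcs: Sequence[Tuple[int, int]]) -> int:
    """Genus of a connected shadow in which every vertex is an arc endpoint.

    Vertices are numbered 0, 1, ... from left to right; backbone_sizes lists
    the number of vertices on each backbone in drawing order.
    """
    n_vertices = sum(backbone_sizes)
    # The involution that maps each arc endpoint to its partner.
    partner = [-1] * n_vertices
    for a, b in arcs:
        partner[a], partner[b] = b, a
    if -1 in partner:
        raise ValueError("every vertex must be an arc endpoint")
    # Cyclic successor of each vertex on its own backbone.
    succ = [0] * n_vertices
    start = 0
    for size in backbone_sizes:
        for k in range(size):
            succ[start + k] = start + (k + 1) % size
        start += size
    # Boundary components are the cycles of v -> succ[partner[v]].
    seen = [False] * n_vertices
    boundaries = 0
    for v in range(n_vertices):
        if seen[v]:
            continue
        boundaries += 1
        while not seen[v]:
            seen[v] = True
            v = succ[partner[v]]
    # Euler characteristic: (#backbones) - (#arcs) + (#boundaries) = 2 - 2g.
    return (2 - len(backbone_sizes) + len(arcs) - boundaries) // 2


def lemma1_genus(i1, i2_0, i2_1) -> int:
    """Right-hand side of eq. (5.1); each argument lists the genera g(S')."""
    total = sum(i1) + sum(i2_0) + sum(g + 1 for g in i2_1)
    return total if i2_0 else total - 1


# Two crossing exterior arcs: genus 0 over two backbones, genus 1 after gluing.
crossing = [(0, 2), (1, 3)]
print(genus([2, 2], crossing), genus([4], crossing))
```

In this example the single irreducible shadow lies in $I_2^1(E)$, and `lemma1_genus([], [], [0])` evaluates the first case of eq. (5.1).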



6. A GRAMMAR FOR 0-STRUCTURES OVER TWO BACKBONES 



In this section, we develop an unambiguous decomposition grammar for 0-structures 
over two backbones, or $0_2$-structures. $0_2$-structures map via $\alpha$ into 1-structures 
over one backbone of genus zero or one. In order to formulate the grammar $\mathcal{G}_0$, let us 
recall that we draw the oriented backbones $R$ and $S$ horizontally and consecutively, 
starting with the $5'$ end of $R$, or $R_1$, and ending with the $3'$ end of $S$, or $S_1$. We 
denote a structure over two backbones by $J_{i,j;h,\ell}$, where $i, j$ are vertices contained 
in $R$ and $h, \ell$ are contained in $S$. In particular, we shall write $[i, i]$ for a single vertex, 
letting $[i, i-1]$ represent an "empty" backbone. For instance, $J_{i,i-1;h,\ell}$ denotes 
the structure over one backbone on the interval $[h, \ell]$ on $S$, where $h < \ell$, $J_{i,j;h,h-1}$ 
denotes the structure over one backbone on the interval $[i, j]$ on $R$, where $i < j$, and 

$$J_{i,i-1;h,h-1} = \varnothing.$$
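The interval convention above, in which $[i, i-1]$ stands for an empty backbone, can be captured in a small data structure. The sketch below is illustrative only; the class name `Block` and its method names are hypothetical and do not appear in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A structure J_{i,j;h,l}: interval [i, j] on R and interval [h, l] on S."""
    i: int
    j: int
    h: int
    l: int

    def r_size(self) -> int:
        # [i, i-1] has size 0 and represents an empty backbone.
        return self.j - self.i + 1

    def s_size(self) -> int:
        return self.l - self.h + 1

    def r_empty(self) -> bool:
        return self.r_size() == 0

    def s_empty(self) -> bool:
        return self.s_size() == 0

    def is_empty(self) -> bool:
        # J_{i,i-1;h,h-1} is the empty structure.
        return self.r_empty() and self.s_empty()

    def backbones(self) -> int:
        # Number of non-empty backbones: 0, 1 or 2.
        return int(not self.r_empty()) + int(not self.s_empty())


only_s = Block(3, 2, 5, 9)   # empty R-interval, S-interval [5, 9]
print(only_s.backbones(), Block(3, 2, 5, 4).is_empty())
```

With this encoding a secondary structure is simply a block with one empty backbone, which is how the arbitrary-structure symbol is used later in the section.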

The key building blocks of $\mathcal{G}_0$ are the following: 

• gap-structures: a gap structure $J^{G}_{i,j;h,\ell}$ is a secondary structure over $[i, \ell]$ with 
a gap from $j$ to $h$ such that $(i, \ell)$ and $(j, h)$ are base pairs; within the two 
gaps, there are no crossing arcs. 

• hybrid-structures: a hybrid structure $J^{Hy}_{i_1,i_\ell;j_1,j_\ell}$ is a maximal sequence of 
intermolecular interior loops consisting of exterior arcs $R_{i_1}S_{j_1}, \ldots, R_{i_\ell}S_{j_\ell}$, 
where $R_{i_h}S_{j_h}$ is nested within $R_{i_{h+1}}S_{j_{h+1}}$ and where the internal segments 
$R[i_h + 1, i_{h+1} - 1]$ and $S[j_h + 1, j_{h+1} - 1]$ consist of single-stranded nucleotides 
only; that is, a hybrid structure (hybrid) is the maximal unbranched stem-loop 
formed by external arcs. 

• tight structures: a tight structure (TS) $J^{T}_{i,j;h,\ell}$ is a structure in which the 
four positions $i$, $j$, $h$ and $\ell$ are endpoints of an irreducible shadow over two 
backbones. 

• pre-tight structures: a pre-tight structure (PTS) is a structure $J^{PT}_{i,j;h,\ell}$ containing 
a tight structure $J^{T}_{i_1,j;h_1,\ell}$ or a hybrid structure $J^{Hy}_{i_1,j;h_1,\ell}$ for some 
$i_1 \ge i$ and $h_1 \ge h$. 

Now we are in position to formulate the production rules of $\mathcal{G}_0$, detailed in Figure 13. 

(1): given an arbitrary structure $J^{I}_{i,j;h,\ell}$, we remove, starting from $j$ and $\ell$, secondary 
structure blocks until an exterior arc is encountered; such an exterior arc is contained 
in a pre-tight structure; otherwise, $J^{I}_{i,j;h,\ell}$ contains no exterior arc and 
thus decomposes into two disjoint secondary structures; 

Figure 13. The grammar $\mathcal{G}_0$. Blocks (A) to (J). (A): a secondary structure over $[i, j]$, (B): 
a tight structure $J^{T}_{i,j;r,s}$, (C): a gap structure $J^{G}_{i,j;r,s}$ over one backbone, 
(D): a substructure $J^{G*}_{i,j;r,s}$ of a gap structure such that $(i, s)$ and $(j, r)$ are 
interior arcs but which itself is not a maximal gap structure, (E): a substructure 
$J^{Hs}_{i,j;r,s}$ consisting of hybrid structures and secondary structures, (F): a hybrid 
structure $J^{Hy}_{i,j;r,s}$, (G): a substructure $J^{Hy*}_{i,j;r,s}$ of a hybrid structure such that 
[...] and $(r, s)$ are exterior arcs but which itself is not a hybrid structure because 
it is not maximal, (H): an arbitrary structure on two backbones, (I): a 
pre-tight structure $J^{PT}_{i,j;r,s}$, (J): an open structure consisting of unpaired 
bases, (1) to (8): decomposition rules for the previously defined blocks. 


(2): the decomposition of pre-tight structures $J^{PT}_{i,j;h,\ell}$: if $R_jS_\ell$ is an exterior arc, then 
it is decomposed into a hybrid $J^{Hy}_{i_1,j;h_1,\ell}$ and an arbitrary substructure $J^{I}_{i,i_1-1;h,h_1-1}$; 
otherwise, it is decomposed into a tight structure $J^{T}_{i_1,j;h_1,\ell}$ and an arbitrary structure 
$J^{I}_{i,i_1-1;h,h_1-1}$; 

(3): in case of tight structures $J^{T}_{i,j;h,\ell}$, depending on which type of shadow is contained 
in the tight structure, there are seven ways to dissect it into maximal gap structures and 
hybrid-structures (which in turn collapse into interior and exterior arcs of the irreducible 
shadow, respectively), as well as secondary structures; 

(4): a substructure $J^{Hs}_{i,j;h,\ell}$ consists of hybrids and secondary structures, where each 
hybrid structure is maximal; 

(5): a maximal hybrid structure $J^{Hy}_{i,j;h,\ell}$ is decomposed into an exterior arc $R_iS_h$ and 
a non-maximal hybrid structure $J^{Hy*}_{i_1,j;h_1,\ell}$ with $i < i_1 \le j$ and $h < h_1 \le \ell$; 

(6): a non-maximal hybrid structure $J^{Hy*}_{i,j;h,\ell}$ is decomposed into an exterior arc $R_iS_h$ 
and a non-maximal hybrid structure $J^{Hy*}_{i_1,j;h_1,\ell}$ with $i < i_1 \le j$ and $h < h_1 \le \ell$; 

(7): a maximal gap structure $J^{G}_{i,j;h,\ell}$ is decomposed via the context-free grammar for 
secondary structures assuming that there is a virtual hairpin loop in $[j, h]$; note that 
the substructure obtained by decomposing a maximal gap structure is no longer maximal; we 
use $J^{G*}_{i,j;h,\ell}$ to denote such a non-maximal gap structure derived via this decomposition; 

(8): a non-maximal gap structure $J^{G*}_{i,j;h,\ell}$ is decomposed similarly to the decomposition 
of a maximal gap structure. 

Lemma 2. Any 0-structure over two backbones can uniquely be decomposed via $\mathcal{G}_0$, 
and any diagram generated by $\mathcal{G}_0$ is a 0-structure over two backbones. 
Proof. First, we show that a $0_2$-structure can uniquely be decomposed into blocks 
containing exclusively non-crossing arcs. We shall establish this by induction on the 
number of its irreducible shadows. 

Induction basis: any $0_2$-structure over two backbones that contains no shadow of 
genus zero over two backbones exhibits no crossing arcs. Namely, it contains only 
blocks that are either secondary structures or hybrids. Accordingly, such a structure 
can be decomposed uniquely via the context-free grammar of secondary structures 
or the unique decomposition of hybrid-structures. 

Induction step: Suppose $E_m$ is a $0_2$-structure containing $m \ge 1$ irreducible shadows 
over two backbones of genus 0. We decompose $E_m$ from "inside to outside", i.e., from 
the $3'$-end of $R$ and the $5'$-end of $S$. Suppose we encounter a substructure $S$ which 
collapses into an irreducible shadow over two backbones of genus 0. $S$ itself determines 
a unique maximal tight structure, $T_S$, such that $\sigma(T_S) = S$. Removing $T_S$ 
from $E_m$ yields a diagram $E_{m-1}$ over two backbones containing $m - 1$ irreducible 
shadows over two backbones of genus 0. The induction hypothesis guarantees the 
unique decomposition of $E_{m-1}$ via $\mathcal{G}_0$. 

It remains to show how to decompose tight structures: the shadow of a tight structure 
is by construction irreducible and is given by one of the seven irreducible shadows 
over two backbones described in Corollary 2. In order to decompose a tight 
structure, we dissect it into maximal gap structures and hybrid-structures (which in 
turn collapse into interior and exterior arcs of the irreducible shadow, respectively), 
as well as secondary structures. All of these elements are $\mathcal{G}_0$-blocks that do not 
contain any crossing arcs and can therefore be decomposed via a modified version 
of the context-free grammar of secondary structures, described above. Accordingly, 
there are seven ways to uniquely decompose a tight structure into blocks containing 
exclusively non-crossing arcs. 

Finally, we show that $\mathcal{G}_0$ generates only $0_2$-structures. By construction, $\mathcal{G}_0$ constructs 
tight structures via secondary structure blocks, gap-structures and hybrid-structures. 
It furthermore generates structures via the insertion of secondary structure blocks, 
hybrid structures and tight structures. Thus, any structure generated by $\mathcal{G}_0$ is a 
$0_2$-structure, whence the lemma. □ 

Theorem 2. The grammar $\mathcal{G}_0$ has the following properties: 

(a) $\mathcal{G}_0$ is unambiguous; 

(b) $\mathcal{G}_0$ allows computation of the partition function, base pairing probabilities, the 
probability of hybrid-blocks, gap-structures and Boltzmann sampling of $0_2$-structures; 

(c) $\mathcal{G}_0$ has a time $O(n^6)$ and space $O(n^4)$ complexity for generating the partition 
function of $0_2$-structures. 

Proof. Assertion (a) follows from Lemma 2. Consequently, $\mathcal{G}_0$ can be employed to 
count $0_2$-interaction structures over two backbones for given sequences $R$ and $S$ as 
well as to compute the partition function 

$$Q = \sum_{s} e^{-G(s)/RT}$$

of $0_2$-structures, where $R$ is the universal gas constant, $T$ is the temperature, $G(s)$ is the 
energy of structure $s$ over sequence $x$, and the sum ranges over the set of 0-interaction structures 
in which all base pairs satisfy the base pairing rules for RNA, i.e., $(x_i, x_j) \in 
\{\mathrm{AU}, \mathrm{UA}, \mathrm{GC}, \mathrm{CG}, \mathrm{GU}, \mathrm{UG}\}$. 
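As a small numerical illustration of the two ingredients just named, the sketch below checks the base pairing rule and evaluates a partition function $\sum_s e^{-G(s)/RT}$ over an explicit list of structure energies. It is a toy under stated assumptions: energies are taken to be in kcal/mol, the temperature defaults to 310.15 K, and the function names are hypothetical. The grammar computes the same quantity by dynamic programming rather than by enumeration.

```python
import math
from typing import List, Sequence

# Admissible nucleotide pairs, as listed in the text.
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

# Universal gas constant in kcal/(mol K); the unit choice is an assumption.
GAS_CONSTANT = 1.98720425864083e-3


def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y satisfy the RNA base pairing rules."""
    return (x + y).upper() in CANONICAL_PAIRS


def partition_function(energies: Sequence[float], temperature: float = 310.15) -> float:
    """Sum of Boltzmann factors exp(-G(s)/RT) over the given structure energies."""
    rt = GAS_CONSTANT * temperature
    return sum(math.exp(-g / rt) for g in energies)


def boltzmann_probabilities(energies: Sequence[float], temperature: float = 310.15) -> List[float]:
    """Probability exp(-G(s)/RT) / Q of each structure."""
    rt = GAS_CONSTANT * temperature
    q = partition_function(energies, temperature)
    return [math.exp(-g / rt) / q for g in energies]


# Three hypothetical structures with free energies 0, -1.2 and -3.4 kcal/mol.
probs = boltzmann_probabilities([0.0, -1.2, -3.4])
print(can_pair("G", "U"), [round(p, 4) for p in probs])
```

The structure with the lowest free energy receives the largest probability, which is the sense in which the minimum free energy structure is the most likely single structure.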

As for assertion (b), let $N_{i,j;h,\ell}$ denote the substructure represented by the nonterminal 
symbol $N$ in $\mathcal{G}_0$ over $[i, j]$ and $[h, \ell]$, where $N \in \{I, PT, T, Hs, Hy, Hy^*, G, G^*\}$. 
Note that secondary structures are presented by an arbitrary structure $I$ setting 
one backbone empty. For each of these symbols, we introduce corresponding partial 
partition functions $Q^{N}_{i,j;h,\ell}$. Since $\mathcal{G}_0$ is unambiguous, the recursions for the partial 
partition functions are derived by replacing minima by sums and addition of energy 
contributions by multiplication of partial partition functions, see e.g. Voß et al. (2006). 
For instance, the recursion for the partition functions corresponding to the 
nonterminal symbol $PT$ reads 

$$Q^{PT}_{i,j;h,\ell} = \sum_{k_1,k_2} Q^{I}_{i,k_1;h,k_2}\, Q^{T}_{k_1+1,j;k_2+1,\ell} + \sum_{k_1,k_2} Q^{I}_{i,k_1;h,k_2}\, Q^{Hy}_{k_1+1,j;k_2+1,\ell}.$$
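The remark that partition function recursions arise from minimum free energy recursions by replacing minima with sums and addition with multiplication can be made concrete by writing the $PT$ rule once and passing the two operations in as parameters. This is a sketch under assumptions: the tables are supplied as plain callables, the split indices are assumed to range over $i-1 \le k_1 \le j-1$ and $h-1 \le k_2 \le \ell-1$ (so that the arbitrary block may be empty), and the name `pt_value` is hypothetical.

```python
import operator
from typing import Callable

Table = Callable[[int, int, int, int], float]


def pt_value(i: int, j: int, h: int, l: int,
             q_i: Table, q_t: Table, q_hy: Table,
             plus: Callable[[float, float], float],
             times: Callable[[float, float], float],
             zero: float) -> float:
    """Evaluate the PT rule over a semiring (plus, times, zero).

    (add, mul, 0.0) gives the partial partition function;
    (min, add, inf) gives the minimum free energy instead.
    """
    total = zero
    for k1 in range(i - 1, j):        # k1 = i-1: empty R-part of the I-block
        for k2 in range(h - 1, l):    # k2 = h-1: empty S-part of the I-block
            left = q_i(i, k1, h, k2)
            # PT -> I followed by a tight structure
            total = plus(total, times(left, q_t(k1 + 1, j, k2 + 1, l)))
            # PT -> I followed by a hybrid structure
            total = plus(total, times(left, q_hy(k1 + 1, j, k2 + 1, l)))
    return total


# Toy tables: every block has weight 1 (partition mode) ...
one = lambda a, b, c, d: 1.0
print(pt_value(1, 3, 1, 2, one, one, one, operator.add, operator.mul, 0.0))

# ... or fixed energies (minimum free energy mode).
e_i = lambda a, b, c, d: 0.0
e_t = lambda a, b, c, d: -1.0
e_hy = lambda a, b, c, d: -2.0
print(pt_value(1, 3, 1, 2, e_i, e_t, e_hy, min, operator.add, float("inf")))
```

Because the grammar is unambiguous, each structure is counted exactly once in partition mode; an ambiguous grammar would still yield the correct minimum but would overcount in the sum.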

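The probabilities $\mathbb{P}_{N_{i,j;h,\ell}}$ of partial substructures of type $N$ are readily calculated 
from the partial partition functions. These "backward recursions" are analogous to 
those derived by McCaskill (1990) for secondary structures without crossings. It 
follows that we have 

$$\mathbb{P}_{N_{i,j;h,\ell}} = \sum_{s \ni N_{i,j;h,\ell}} \mathbb{P}_s,$$

where the sum is over all $0_2$-interaction structures $s$ containing $N_{i,j;h,\ell}$ and $\mathbb{P}_s$ denotes the Boltzmann probability of $s$. 
Suppose $N_{i,j;h,\ell}$ is obtained by decomposing $\theta_s$. The conditional probabilities $\mathbb{P}_{N_{i,j;h,\ell} \mid \theta_s}$ 
are then given by $Q_{\theta_s}(N_{i,j;h,\ell}) / Q_{\theta_s}$, where $Q_{\theta_s}$ represents the partition function of $\theta_s$ 
and $Q_{\theta_s}(N_{i,j;h,\ell})$ represents the partition function for those $\theta_s$-configurations that 
contain $N_{i,j;h,\ell}$. Taking the sum over all possible $\theta_s$, we obtain 

$$\mathbb{P}_{N_{i,j;h,\ell}} = \sum_{\theta_s} \mathbb{P}_{\theta_s}\, \frac{Q_{\theta_s}(N_{i,j;h,\ell})}{Q_{\theta_s}}.$$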



From this backward recursion, one immediately derives a stochastic backtracing 
recursion from the probabilities of partial structures that generates a Boltzmann sample 
of 0-interaction structures; see Tacker et al. (1996); Ding and Lawrence (2003); 
Huang et al. (2010) for similar constructions. The basic data structure for this sampling 
is a stack $A$ which stores blocks of the form $(i, j; r, s, N)$, presenting interaction 
substructures of nonterminal symbols $N$. $L$ is a set of base pairs storing those 
removed by the decomposition step in the grammar. We initialize with the block 
$(1, n, I)$ in $A$, and $L = \varnothing$. In each step, we pick up one element in $A$ and decompose 
it via the grammar with probability $Q^{N'}/Q^{N}$, where $Q^{N}$ is the partition function 
of the block which is picked up from $A$, and $Q^{N'}$ is the partition function of the 
target block which is decomposed by the rewriting rule. The base pairs which are 
removed in the decomposition step are moved to $L$. For instance, for the decomposition 
rule of $J^{PT}_{i,j;h,\ell}$, decomposing the block $(i, j; h, \ell, PT)$ into the two blocks $(i, k_1; h, k_2, I)$ 
and $(k_1 + 1, j; k_2 + 1, \ell, T)$, for fixed indices $k_1, k_2$, the probability of decomposing 
$(i, j; h, \ell, PT)$ reads 

$$\mathbb{P} = \frac{Q^{I}_{i,k_1;h,k_2}\, Q^{T}_{k_1+1,j;k_2+1,\ell}}{Q^{PT}_{i,j;h,\ell}}.$$

The sampling step is iterated until $A$ is empty. The resulting $0_2$-interaction structure 
is given by the list $L$ of base pairs. The probability of hybrid-structures can be 
calculated since a hybrid structure is by construction a block in the grammar, see 
Huang et al. (2010). The probability of interactions involving a fixed interval $[i, j]$ 
is given by 

$$\mathbb{P}^{\mathrm{target}}_{[i,j]} = \sum_{h,\ell} \mathbb{P}_{Hy_{i,j;h,\ell}}.$$

A gap structure, representing a maximal non-crossing stem on either backbone, is also 
a $\mathcal{G}_0$-block, whence its probability is readily computable. Similarly, the probability 
of pairings within the same backbone for a fixed interval $[i, j]$ can be expressed in terms of the probabilities of the corresponding blocks. 
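The stack-based stochastic backtracking described above can be illustrated on a much simpler grammar. The sketch below swaps in a toy: non-crossing structures over a single backbone with all energies set to zero, so that the "partition function" of an interval is just the number of structures on it and sampling is uniform. It is not the grammar for $0_2$-structures; the function names are hypothetical, and only the mechanics (pick a block from the stack, choose a rewriting with probability proportional to the product of the target partition functions, record removed base pairs in $L$) mirror the text.

```python
import random
from functools import lru_cache
from typing import List, Tuple


@lru_cache(maxsize=None)
def count_structures(i: int, j: int) -> int:
    """Number of non-crossing structures on the interval [i, j] (energy 0)."""
    if j <= i:
        return 1                          # empty interval or single vertex
    total = count_structures(i + 1, j)    # vertex i unpaired
    for k in range(i + 1, j + 1):         # vertex i paired with k
        total += count_structures(i + 1, k - 1) * count_structures(k + 1, j)
    return total


def sample_structure(n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Draw one structure on n vertices uniformly by stochastic backtracking."""
    stack = [(0, n - 1)]                  # the stack A of blocks
    pairs: List[Tuple[int, int]] = []     # the list L of removed base pairs
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        # Pick a rewriting with probability weight / count_structures(i, j).
        x = rng.randrange(count_structures(i, j))
        weight = count_structures(i + 1, j)
        if x < weight:
            stack.append((i + 1, j))
            continue
        x -= weight
        for k in range(i + 1, j + 1):
            weight = count_structures(i + 1, k - 1) * count_structures(k + 1, j)
            if x < weight:
                pairs.append((i, k))
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
            x -= weight
    return sorted(pairs)


print(count_structures(0, 3), sample_structure(6, random.Random(1)))
```

Replacing the integer counts by Boltzmann-weighted partial partition functions, and `randrange` by a uniform draw in $[0, Q)$, turns the same loop into a Boltzmann sampler for the toy grammar.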

In order to prove assertion (c), we observe that any product of two blocks has $O(n^6)$ 
time complexity. We conclude from this that all $\mathcal{G}_0$-rules, except for (3) and (4), are 
of $O(n^6)$ time complexity. It thus remains to consider rules (3) and (4). To this end, we 
introduce intermediate blocks whose function is transitional storage. 

1. Jfj-hi stores the result of the product ^/^f/j/j^ and two secondary structure 
over interval [ii + 1, j] and [hi + 1, P\ with i < ii < j and h < hi < £. 



Footnote: as they stand, rules (3) and (4) are in fact of higher time complexity than $O(n^6)$. 
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2. Jj^j-h/ stores the result of the product J^i-^-h^^^ and two secondary structure 
over interval [ii + 1, j] and [h + 1, /ii] with i < ii < j and h < hi < i. 

3. Jij-h,e stores the result of the product Jfiyj^j and J^Jlij^^i-he, with i < ii < 
Ji < J- 

4- J^^rM stores the result of the product Jj:^i^.h,,£ and J^J^j.^^^^^.^ with i < h < ] 
and h < hi < i. 

5. Ji^j-hi stores the result of the product JY,iyj^,j and Ji^+i j^^^i-h/ with i < ii < 
Ji < J- 

By virtue of these new blocks, we may rewrite (3) and (4) in terms of (3') and (4') 
as displayed in Figure 14. After including these five intermediate blocks, we obtain 
two additional nonterminal symbols in each decomposition rule. Since it requires 
two free variables to have the product of two nonterminal symbols and at most four 
variables to describe the two blocks, the decompositions in this form are of $O(n^6)$ 
time complexity. We use at most 4-dimensional matrices to store the blocks in $\mathcal{G}_0$, 
whence the $O(n^4)$ space complexity. □ 

Figure 14. The decomposition of $J^{T}_{i,j;h,\ell}$ and $J^{Hs}_{i,j;h,\ell}$ via the five intermediate 
blocks. They allow the decomposition of $J^{T}_{i,j;h,\ell}$ and $J^{Hs}_{i,j;h,\ell}$ with $O(n^6)$ time complexity. 
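The role of intermediate blocks as transitional storage can be seen in a scaled-down analogy. The sketch below is not one of the five blocks of the grammar; it uses ordinary two-index tables and a product of three factors, which costs $O(n^4)$ when evaluated as one rule with two free split indices but only $O(n^3)$ when the product of the first two factors is stored first. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def triple_product_direct(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """One rule with two free split indices k1, k2: O(n^4) operations."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k1 in range(n):
                for k2 in range(n):
                    out[i][j] += a[i][k1] * b[k1][k2] * c[k2][j]
    return out


def product(x: Matrix, y: Matrix) -> Matrix:
    """A rule with a single free split index: O(n^3) operations."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triple_product_staged(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Store the intermediate block a*b, then combine it with c."""
    intermediate = product(a, b)      # transitional storage
    return product(intermediate, c)


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[2, 0], [1, 3]]
print(triple_product_direct(a, b, c) == triple_product_staged(a, b, c))
```

The same idea applies with four-index blocks: each rewritten rule multiplies only two stored blocks, so at most two free indices are added to the four that describe the result.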



7. Discussion 



In this paper, we have introduced the topological filtration of RNA interaction structures 
and developed the notions of shadows, irreducibility and $\gamma$-structures for them. 
Shadows are of particular importance for the minimum free energy folding since they 
represent the basic motifs of genus $g$. Since we have proved that for any genus there 
are only finitely many such shadows, it is in principle possible to assign 
them individual energies, which would presumably lead to high specificity. 



The simplest topological class of RNA interaction structures is that of 0-structures 
over two backbones. This is the two-backbone analogue of the classical RNA 
secondary structures. Despite their simple irreducible shadows (Corollary 2), 
0-structures over two backbones exhibit features not present in the AP-structures of 
Pervouchine (2004); Alkan et al. (2006). Namely, they allow for pseudoknots formed 
by exterior arcs as reported, for instance, in Homo sapiens ACA27 snoRNA, see Figure 3 
and Figure 15. 
Let us next compare AP-structures and 0-structures over two backbones in more 
detail. Recall that an AP-structure, $J(R, S, I)$, is a graph such that 

(1) $R$, $S$ are secondary structures, 

(2) $I$ is a set of exterior arcs without external pseudoknots, 

(3) $J(R, S, I)$ contains no zig-zags. 



A tight AP-structure ($R(TS)$) is a substructure that cannot be decomposed via block 
decomposition (Huang et al., 2009, 2010). Accordingly, the shadow of an $R(TS)$ is 
connected and hence irreducible. $R(TS)$ and tight structures of 0-structures over 
two backbones are distinct concepts. We have already observed that 0-structures 
over two backbones are not contained in the set of AP-structures. Likewise, 
AP-structures are not contained in the set of 0-structures over two backbones. For example, 
consider a shadow of a 0-structure over two backbones which consists of $x \ge 3$ 
distinct, irreducible shadows over two backbones having genus 0. According to 
Lemma 1, the genus of this diagram is $x - 1$. Drawing an interior arc covering the 
$R$-endpoints of these $x$ shadows tightly, the resulting diagram is by construction an 
$R(TS)$ as in Figure 15. As inserting a single arc changes the genus at most by one, 
the diagram, $R(TS)$, has genus $\ge 1$, has an irreducible shadow and is consequently 
not a 0-structure over two backbones. 
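Spelled out, the genus bound in this example reads as follows. This is a worked restatement under the assumption that each of the $x$ shadows $S'$ satisfies $g(\alpha(S')) = g(S') + 1$, so that the first case of Lemma 1 applies; the symbols $E_x$ (the diagram formed by the $x$ shadows) and $E_x^{+}$ (the diagram after inserting the interior arc) are introduced here for illustration only.

```latex
g(E_x) \;=\; \sum_{k=1}^{x} (0 + 1) \;-\; 1 \;=\; x - 1,
\qquad
g(E_x^{+}) \;\ge\; g(E_x) - 1 \;=\; x - 2 \;\ge\; 1
\quad \text{for } x \ge 3 .
```

Since $E_x^{+}$ has a connected, hence irreducible, shadow of genus at least one, it violates the condition $g(S') \le 0$ required of a 0-structure.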
Figure 15. (A): a 0-structure over two backbones that is not an AP-structure; 
the crossing hybrid. (B): an AP-structure that is not a 0-structure 
over two backbones; this structure contains an irreducible shadow 
over two backbones of genus 1. 